R Markdown

Let's first set some parameters.

```r
library(IHW)       # development version providing use_forest and fdp_eval
library(ggplot2)

m <- 10000
nfoldsI <- 3

folds <- sample(1:nfoldsI, m, replace = TRUE)
adjustment_type <- "BH"
```

Now we create the betamix sample data according to Section 5.2, with two-dimensional covariates.

```r
mus_slope <- 1.5
prob_one_sided <- 0.25

# Two-dimensional covariates, uniform on the unit square
Xs <- matrix(runif(m * 2, 0, 1), ncol = 2)
colnames(Xs) <- c("X1", "X2")

# Alternative probability: low inside the quarter circle, higher outside
pi1s <- ifelse(Xs[, 1]^2 + Xs[, 2]^2 <= 1, 0.02, 0.4)
mus <- pmax(1.3, sqrt(Xs) %*% c(1, 1) * mus_slope)
mu_alphas <- 1 / mus

# Nulls are uniform; alternatives are Beta(1 / mu, 1)
Hs <- stats::rbinom(m, size = 1, prob = pi1s)
Ps <- stats::runif(m) * (1 - Hs) + stats::rbeta(m, mu_alphas, 1) * Hs
```
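As a quick sanity check on the simulation (a self-contained re-draw with `_chk` suffixes, mirroring the chunk above), the realized fraction of alternatives should be close to the average alternative probability:

```r
# Self-contained sanity check of the two-component model: the realized
# fraction of alternatives mean(Hs) should be close to mean(pi1s).
set.seed(1)
m_chk <- 10000
Xs_chk <- matrix(runif(m_chk * 2), ncol = 2)
pi1s_chk <- ifelse(Xs_chk[, 1]^2 + Xs_chk[, 2]^2 <= 1, 0.02, 0.4)
Hs_chk <- rbinom(m_chk, size = 1, prob = pi1s_chk)
c(expected = mean(pi1s_chk), realized = mean(Hs_chk))
```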

To run regular IHW, we need to manually bin the covariates into 2D squares.

```r
bins1d <- seq(from = 0, to = 1, length.out = 6)

# Combine the two per-coordinate interval labels into a single factor;
# interaction() avoids the type coercion that apply() over a data frame
# of factors would introduce here.
Xs_binned <- interaction(
  cut(Xs[, 1], bins1d),
  cut(Xs[, 2], bins1d),
  sep = "*"
)
```
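To see what the binning produces, here is a tiny standalone example (the points `x`, `y` are made up for illustration):

```r
# Tiny illustration of the 2D binning: cut() turns each coordinate into an
# interval label, and interaction() pastes the two labels into one factor.
bins1d <- seq(from = 0, to = 1, length.out = 6)
x <- c(0.05, 0.55, 0.95)
y <- c(0.10, 0.60, 0.20)
bins2d <- interaction(cut(x, bins1d), cut(y, bins1d), sep = "*")
as.character(bins2d)
# → "(0,0.2]*(0,0.2]" "(0.4,0.6]*(0.4,0.6]" "(0.8,1]*(0,0.2]"
```

With 5 intervals per axis, the full factor has 25 possible levels, one per 2D square.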

We run plain IHW on the binned covariates.

```r
ihw_no_forest <- ihw(
  Ps, Xs_binned,
  alpha = 0.1, folds = folds, use_forest = FALSE,
  adjustment_type = "BH", lp_solver = "lpsymphony", lambda = Inf
)
```

Now we run IHW Forest with 100 trees. For the Boca-Leek tree we need to provide the censoring parameter tau; its exact value should not matter much. We will revisit this choice at a later stage of the project.
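As background on tau (my reading of its role, not code from the package): it is the censoring level of a Storey/Boca-Leek-style null-proportion estimator, which counts p-values above tau and rescales. A minimal standalone sketch with made-up data:

```r
# Sketch of a Storey/Boca-Leek-style censored null-proportion estimate:
# pi0_hat = mean(P > tau) / (1 - tau). Alternatives rarely exceed tau, so
# the count above tau is driven by the uniform nulls, giving a mildly
# conservative estimate of the null proportion.
set.seed(1)
P_demo <- c(runif(8000), rbeta(2000, 0.25, 1))  # 80% true nulls
tau_demo <- 0.7
pi0_hat <- mean(P_demo > tau_demo) / (1 - tau_demo)
pi0_hat  # close to (slightly above) the true 0.8
```

Because only the exceedance count matters, moderate changes to tau move the estimate little, which is consistent with its exact value not being too important.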

```r
tau <- 0.7
ihw_forest <- ihw(
  Ps, Xs,
  alpha = 0.1, folds = folds, use_forest = TRUE, ntrees = 100, tau = tau,
  lp_solver = "lpsymphony", lambda = Inf
)
```

## Evaluating and comparing

Let's have a look at power and FDP. Note that, like Nikos, we are using only a single Monte Carlo replicate, so the FDP should be interpreted with caution. However, I took special care not to violate any assumptions of the theorems, so FDR control should be guaranteed from a theoretical point of view. We see that, for this example, the forest structure modestly increases the number of rejections (94 vs. 89).
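`fdp_eval` ships with the analysis code; for readers without it, a hypothetical reconstruction matching the column names in the output below could look like:

```r
# Hypothetical reconstruction of an fdp_eval-style helper: given true labels
# Hs (1 = alternative) and a logical rejection vector, report the number of
# rejections, power, false discovery proportion, and whether any null was
# rejected (the FWER indicator for a single replicate).
fdp_eval_sketch <- function(Hs, rejected) {
  rjs <- sum(rejected)
  false_rjs <- sum(rejected & Hs == 0)
  data.frame(
    rjs  = rjs,
    pow  = sum(rejected & Hs == 1) / max(1, sum(Hs == 1)),
    FDP  = false_rjs / max(1, rjs),
    FWER = false_rjs > 0
  )
}

res <- fdp_eval_sketch(Hs = c(1, 1, 0, 0), rejected = c(TRUE, FALSE, TRUE, FALSE))
# → rjs = 2, pow = 0.5, FDP = 0.5, FWER = TRUE
```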

```r
fdp_eval_no_forest <- fdp_eval(Hs, IHW::rejected_hypotheses(ihw_no_forest))
fdp_eval_forest <- fdp_eval(Hs, IHW::rejected_hypotheses(ihw_forest))
rbind(
  manually_cut = fdp_eval_no_forest,
  use_forest = fdp_eval_forest
)
```

```
##              rjs        pow        FDP FWER
## manually_cut  89 0.07948970 0.08988764 TRUE
## use_forest    94 0.08243376 0.10638298 TRUE
```

For completeness, let's plot the weights in 2D, as in Nikos' draft proof PDF.

```r
data <- data.frame(Xs, weight = ihw_no_forest@df[["weight"]])
ggplot(data = data, aes(x = X1, y = X2, color = weight)) +
  geom_point()

data <- data.frame(Xs, weight = ihw_forest@df[["weight"]])
ggplot(data = data, aes(x = X1, y = X2, color = weight)) +
  geom_point()
```

## Noise extra dimensions
Let's add some uninformative noise dimensions. If the method still works, this would be very convenient for the end user and would discourage active cheating. Furthermore, this many dimensions definitely necessitates the use of regression trees.

```r
noise_dim <- 5
noise <- matrix(runif(m * noise_dim), nrow = m, ncol = noise_dim)
Xs <- cbind(Xs, noise)
ihw_forest_noisy <- ihw(
  Ps, Xs,
  alpha = 0.1, folds = folds, use_forest = TRUE, ntrees = 100, tau = tau,
  lp_solver = "lpsymphony", lambda = Inf
)
fdp_eval(Hs, IHW::rejected_hypotheses(ihw_forest_noisy))
```

```
##   rjs       pow       FDP FWER
## 1 357 0.3022571 0.1372549 TRUE
```

## But does it work better in the univariate case?

Let's return to a simple, univariate example. The official reference manual works with the following data.

Let's run IHW Forest and quantile slicing on this:

```r
ihw_quantile_slicing <- ihw(
  Ps, Xs,
  alpha = 0.1, folds = folds, use_forest = FALSE,
  adjustment_type = "BH", lp_solver = "lpsymphony", lambda = Inf
)
ihw_forest <- ihw(
  Ps, Xs,
  alpha = 0.1, folds = folds, use_forest = TRUE, ntrees = 100, tau = tau,
  lp_solver = "lpsymphony", lambda = Inf
)
rbind(
  quantile_slicing = fdp_eval(Hs, IHW::rejected_hypotheses(ihw_quantile_slicing)),
  use_forest = fdp_eval(Hs, IHW::rejected_hypotheses(ihw_forest))
)
```

```
##                  rjs        pow       FDP FWER
## quantile_slicing  81 0.06913340 0.1234568 TRUE
## use_forest        25 0.02336904 0.0400000 TRUE
```

## Conclusion

We need to do better benchmarking, with different models and more replicates.
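A minimal skeleton for such a benchmark could look as follows; plain BH via `p.adjust` stands in for the IHW variants so the loop runs on its own, and the model inside `one_rep` is a made-up univariate example:

```r
# Multi-replicate benchmark skeleton. Plain BH (p.adjust) is a placeholder
# for the IHW variants; swap in ihw() calls once the package is loaded.
set.seed(1)
one_rep <- function(m = 2000, alpha = 0.1) {
  X <- runif(m)
  H <- rbinom(m, size = 1, prob = ifelse(X < 0.5, 0.05, 0.3))
  P <- ifelse(H == 1, rbeta(m, 0.25, 1), runif(m))
  rej <- p.adjust(P, method = "BH") <= alpha
  c(FDP = sum(rej & H == 0) / max(1, sum(rej)),
    pow = sum(rej & H == 1) / max(1, sum(H)))
}
res <- t(replicate(50, one_rep()))
colMeans(res)  # Monte Carlo estimates of FDR and average power
```

Averaging FDP over replicates estimates the FDR, which is the quantity the theorems actually control; a single replicate, as used above, cannot distinguish FDR control from luck.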